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Abstract 

X-ray crystallography is a powerful tool to determine the protein 3D structure. However, it 
is time-consuming and expensive, and not all proteins can be successfully crystallized, partic- 
ularly for membrane proteins. Although nuclear magnetic resonance (NMR) spectroscopy is 
indeed a very powerful tool in determining the 3D structures of membrane proteins, it is also 
time-consuming and costly. To the best of the authors' knowledge, there is little structural 
data available on the AGAAAAGA palindrome in the hydrophobic region (113-120) of prion 
proteins due to the noncrystalline and insoluble nature of the amyloid fibril, although many 
experimental studies have shown that this region has amyloid fibril forming properties and plays 
an important role in prion diseases. In view of this, the present study is devoted to address 
this problem from computational approaches such as global energy optimization, simulated 
annealing, and structural bioinformatics. The optimal atomic-resolution structures of prion 
AGAAAAGA amyloid fibils reported in this paper have a value to the scientific community in 
its drive to find treatments for prion diseases. 

Key words: Hybrid computational algorithms, optimizing molecular structures, molecular 
modeling. 
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1 Introduction 



Prion diseases include Creutzfeldt-Jakob disease, variant Creutzfeldt-Jakob diseases, 
Gerstmann-Strussler-Scheinker syndrome. Fatal Familial Insomnia, Kuru in humans, 
scrapie in sheep, bovine spongiform encephalopathy (or mad-cow disease) and chronic 
wasting disease in cattle. These diseases are invariably fatal and highly infectious neu- 
rodegenerative diseases affecting humans and animals. However, for threating all these 
diseases, there is no effective therapeutic approaches or medications [Aguzzi and Heiken- 
walder, 2006, Prusiner, 1998, Weissmann, 2004]. Prion diseases are caused by conver- 
sion of prion proteins from a soluble into an insoluble fibrillar form [Prusiner, 1982 



Prusiner, 1998] . The normal cellular prion protein (PrP*-") is rich in a- helices but the 



infectious prions (PrP^'^) are rich in /3-sheets [Griffith, 1967| . For PrP^, the N-terminal 
residues (1-123) are unstructured, but the C-terminal residues (124-231) are well struc- 
tured with three a- helices, two short anti-parallel /3-strands, and a buried disulfide bond 
between the 2nd and 3rd a-helix (residues number 179 and number 214). The prion pro- 
tein molecular structure and dynamics can be seen from Zhang, 2010[ [Zhang, 2011b 



Zhang, 2011c etc.. For PrP^'^, the rich /3-sheets formulate into prion amyloid fibrils. 



The hydrophobic region (113-120) AGAAAAGA palindrome of prion proteins falls 
just within the N-terminal unstructured region (1-123) of prion proteins which is hard 
to determine its molecular structure using NMR spectroscopy or X-ray crystallography 
[Riek et al., 1996] . However, many experimental studies such as [Brown, 2000, Brown, 
2001, Brown et al, 1994, Holscher et al., 1998, Jobling et al., 2001, Jobling et al, 1999, 
Kuwata et al., 2003, Norstrom and Mastrianni, 2005, Wegner et al., 2002] have shown 
that: (1) the hydrophobic region (113-120) AGAAAAGA of prion proteins plays an 
important role in the conversion of PrP^ to the abnormally folded form PrP^'^; and (2) 
AGAAAAGA is important for amyloid fibril formation and is an inhibitor of prion dis- 
eases. Zhang (2009) also confirmed through computer molecular dynamics simulations 
that the stability of prion proteins might be attributable mainly to the N-terminal 
unstructured region (1-123). Due to the noncrystalline and insoluble nature of the 
amyloid fibril, it is very difficult to obtain atomic-resolution structures of AGAAAAGA 



using traditional experimental methods [Tsai, 2005[ Zheng et al., 2006 . For the sake of 



clarity on computers again, we use the program used by [Zhang et al., 200"7] to theoret- 
ically confirm that prion (113-120) AGAAAAGA segment has an amyloid fibril forming 
property. The theoretical computation results are shown in Fig. 4 of [Zhang, 2011a[ , 
from which we can see that the prion AGAAAAGA region (113-120) is clearly identified 
as the amyloid fibril formation region because the energy is less than the amyloid fibril 



formation threshold energy of -26 KCal/mol [Zhang et al., 2007 . Thus, we got con 



fidence in constructing the atomic-resolution molecular structures of prion (113-120) 
AGAAAAGA amyloid fibrils by computer computational approaches or introducing 
novel mathematical formulations and physical concepts. 

Many studies have indicated that computational approaches or introducing novel 
mathematical formulations and physical concepts into molecular biology, such as Ma- 
halanobis distance Chou, 1995 Chou and Zhang, 1995[ , pseudo amino acid composi- 



tion [Ghou, 2001HGhou, 2011] , graphic rules [Andraos, 2008||Ghou, 1989b||Ghou, 1990 
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Chou, 20101 |Zhou and Deng, 1984| , complexity measure factor [Xiao et al., 2006b, Xiao 



et al., 2005b], homology modeling [Chou, 2004a| , cellular automaton [Xiao and Chou 
2007, Xiao et al., 2006a, Xiao et al., 2009, Xiao et al., 2005a], molecular docking 
[Chou et al., 2003] , grey theory [Xiao et al., 2008a] , geometric moments [Xiao et al., 
2008b], low- frequency (or Terahertz frequency) phonons [Chou, 1988[ |Chou, 1989a 



|Chou and Chen, 1977| , solitary wave [Chou et al., 1994| |Sinkala, 2006] , and surface 
diffusion-controlled reaction [Chou and Zhou, 1982 , can significantly stimulate the de- 
velopment of biological and medical science. Various computer computational ap- 
proaches were used to address the problems related to "amyloid fibril" [Carter and 
Chou, 1998, Chou, 2004b, Chou, 2004c, Chou and Howe, 2002, Wang et al., 2008, 
Wei et al., 2005]. Here, we would like to use the hybrid local and global optimization 
search methods to investigate the optimal atomic-resolution amyloid fibril models in 
hopes that the findings thus obtained may be of use for controlling prion diseases. Us- 
ing the traditional local search steepest descent and conjugate gradient [Li and Chen, 
2005, Sun and Zhang, 2001, Zhu and Chen, 2008] methods hybridized with the stan- 
dard global search simulated annealing method [Horst et al., 2003, Yiu et al., 2004], 
zhang (2011a) successfully constructed three optimal atomic-resolution structures of 
prion AGAAAAGA amyloid fibrils. These structures were constructed based on the 
breakthrough work of [Sawaya et al., 2007] . In [Zhang, 2011a] , the author pointed out 
that basing on the NNQNTF peptide of elk prion 173-178 (PDB entry 3FVA that was 
released into Protein Data Bank (http://www.rcsb.org) on 30-JUN-2009, deposition 
date 15-JAN-2009) we might also be able to construct amyloid fibril models for prion 
AGAAAAGA palindrome; this paper is doing this homology model construction work. 
The homology models were built using an improved hybrid Simulated Annealing (SA 
[Kirkpatrick et al., 1983| ) Discrete Gradient (DG [Bagirov et al., 2008t [B^girov, 2003| ) 
method. Then the models were optimized/solved using the traditional steepest de- 



scent (SD) and conjugate gradient (CG) local search methods of [Case et al., 2008 



the former has nice convergence but is slow when close to minimums and the latter is 
efficient but its gradient RMS and GMAX gradient do not have a good convergence 
[Case et al., 2008] , We used the SD method followed by the CG method to optimize our 
models. When the models could not be optimized further, we employed the standard 
global search SA method of [Case et al., 2008] to escape from the stationary point calcu- 
lated by the local search SD &: CG methods. Through the further refinement of SD and 
CG local search methods, at last two optimal models were successfully got. Numerical 
results in this paper show that the hybridization of local and global search optimiza- 
tion methods is very effective. X-ray crystallography finds the X-ray final structure of 
a protein, which usually need refinements using a SA protocol in order to produce a 
better structure; this paper also correctly illustrates the SA protocol of crystallography. 



X-ray crystallography is a powerful tool to determine the protein 3D structure. 
However, it is time-consuming and expensive, and not all proteins can be successfully 
crystallized, particularly for membrane proteins. Although NMR spectroscopy is in- 
deed a very powerful tool in determining the 3D structures of membrane proteins [Call, 
et al., 2010, Call, et al, 2006, Oxenoid and Chou, 2005, Pielak and Chou, 2010a, Pielak 
and Chou, 2010b, Pielak and Chou, 2011, Pielak et al., 2009, Schneh and Chou, 2008, 
Wang et al., 2009], it is also time-consuming and costly. To the best of the authors' 
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knowledge, there is little structural data available on the AGAAAAGA palindrome in 
the hydrophobic region (113-120) of prion proteins, although many experimental stud- 
ies have shown that this region has amyloid fibril forming properties. In view of this, 
the present study was devoted to address this problem from computational approaches 
such as global energy optimization [Chou and Scheraga, 1982 Chou et al., 1992] , sim- 
ulated annealing [Chou, 1992 Chou and Carlacci, 1991| , and structural bioinformat- 
ics [Chou , 2004c| . Thus, the optimization computational approaches (such as SADG, 
GADG, SDCG etc) presented in this paper are very necessary and important to study 
amyloid fibrils, nanotubes, etc.. The prion AGAAAAGA optimal atomic-resolution 
structures of this paper might have a value for finding treatments for prion diseases. 



2 The Optimization Model Building 

Recently the protein fibril structure of NNQNTF (173-178) segment from elk prion 
protein was released [Wihzius et al., 2009| . Its PDB entry ID is 3FVA in the Protein 
Data Bank. This fibril has six chains, belonging to Class 1 of |Sawaya et al., 2007 . 
The atomic structure is a steric zipper, with strong van der Waals (vdw) interactions 
between /3-sheets and hydrogen bonds to maintain the /3-strands. 



Basing on this steric zipper, two prion AGAAAAGA palindrome amyloid fibril mod- 
els - a six chains AAAAGA model (Model 1) and a six chains GAAAAG model (Model 
2) - will be successfully constructed. The minimum sequence necessary for fibril for- 
mation should be AGAAA, AAAGA, AGAAAA, GAAAAG, AAAAGA, AGAAAAG, 



GAAAAGA or AGAAAAGA [Zhang, 20Tla , which are important for fibril forma 



tion and are an inhibitor of PrP^*^ neurotoxicity [Brown, 2000 . Because the peptide 




NNQNTF has six residues and the six chains AGAAAA model could not successfully 
pass SA, AAAAGA and GAAAAG were picked out of the eight possible sequences. 
The D chain (i.e. /3-sheet 2) of 3FVA.pdb can be obtained from A Chain (i.e. /3-sheet 
1) using the mathematical formula 

-14.31482 

D = \ 1 1^+1 2.42 I . (1) 

-21.03096 

AD chains of Models 1-2 (Figures 1-2) were respectively got from AD chains of 3FVA.pdb 
using the mutate module of the free package Swiss-Pdb Viewer (SPDBV Version 4.01) 
Ohttp: / /spdbv.vital-it .chp , but the vdw contacts are too far, very bad at this moment 
(Figures 1-2). To get good vdw interactions will be an optimization problem described 
as follows. 

Neutral atoms are subject to two distinct forces in the limit of large distance and 
short distance: a dispersion force (i.e. attractive vdw force) at long ranges, and a 
repulsion force, the result of overlapping electron orbitals. The Lennard-Jones (L-J) 
potential represents this behavior (http://en.wikipedia.org/wiki/Lennard-Jones_potential^, 
or Locatelli M. et al. (2008) and references therein). The L-J potential is of the form 

V{r) = 4e [i^r - i^f] , (2) 
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Figure 1: Model 1 - bad vdw contacts of AD chains of the AAAAGA modeh 



where e is the depth of the potential well and a is the atom diameter; these parame- 
ters can be fitted to reproduce experimental data or deduced from results of accurate 
quantum chemistry calculations. The (^)^^ term describes repulsion and the (^)^ term 
describes attraction. If we introduce the coordinates of the atoms whose number is 
denoted by N and let e = a = 1 be the reduced units, the form ^ becomes 



fix) 




1 



where = (x3i_2 - X2.j-2f + [xm-i - a^sj-i)^ + {x?.i - x^^jf , (x3i_2, 3;3i__i, xsi^ 
coordinates of atom i, N > 2. The minimization of L-J potential f{x) on M" (where 
n = 3N) is an optimization problem: 



(3) 



is the 



min f(x). 



(4) 



For solving the optimization problem (4), many studies, for examples [Coleman 
et al., 1994, Doye, 1999, Huang et al., 2002, Huang and pardalos, 2002, Leary, 1997, 
Wolfe, 1975, Wolfe, 1976, Pardalos et al, 1994, Romero et al., 1999, Xue et al., 1992, 
Xue, 1993, Xue, 1994a, Xue, 1994b, Xue, 2002], have been done. A brief review on 
some results and computational algorithms before the year 2003 for solving (4) can be 
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Figure 2: Model 2 - bad vdw contacts of AD chains of the GAAAAG model. 



referred to Zhang, 2003 , for the year 2004 can be referred to [Xiang, et al. 2004a 



Xiang, et al. 2004b ; the website http://www-wales.ch.cam.ac.uk/index.html should 
not be ignored. Basing on these studies, we present the following two successful al- 
gorithms. 

Algorithm 1: Hybrid SADG method 
Initialization: 

Define the objective function f and its feasible solution space. 
Call the initial feasible solution generating procedure to get x. 
Call initial temperature selecting procedure to get T. 
Inialization of /: f ~ f{x). 

Initialize the neigbourhood feasible solution: x-ueighbour = 0. 
Initialize xJjest: xJjest — x. 
Initialize fjbest: f Jbest = f. 



do { 



DG local search part: 

fJ)estJocal — local_sea.rch.{xJ>est,x_new_gotten); 
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X = xjnew_gotten; 
SA global search part: 



do { 

do { 

xjneighhour = randomly jperturb{x) [Bagirov and Zhang, 2003] ; 

f jneighhour — f (xjneighhour); 

Calculate the difference A = f jneighhour — f; 

If (A < 0) or (random[0,l] < exp(-A/T)) 

X — xjneighhour f = f jneighhour; 
If (/ l£ f -best) xJbest — x f -best = /; 
} while (equilibrium has not been reached); 
Temperature annealing 
} while (SA stop criterion has not been met); 

} while ( f_best-f_bestJocal < -0.001 ); 

The local search DG method is efficient and effective but it is also always trapped in a 
local solution. SA is a global search method but sometimes just gets a global solution 
with low probability according to the Metropolis criterion. SA can make DG jump out 
of the local trap and then DG can make SA reach an optimal solution with the 100% 
full probability. The cons of Algorithm 1 might be the large numbers of iterations in 
SA when searching the global solution with low probability. 

Algorithm 2: Hybrid GADG (Genetic Algorithm | Forrest, 1993] DG) method 

Step 0. Set the seeding of the initial parental population. We set the cluster of 98 atoms as 
the base of the seed because the 98 atoms cluster has a tetrahedral structure. 

Step 1. Apply the discrete gradient method on all individuals of the initial parental population 
to relax them to their nearest local minimal energy positions. 

Step 2. Call the mating procedure of [Deaven et a I., 1995] to get the center of the mass of 
the parental population. Then set a random number for this mating procedure to mate more 
parents with each other. Thus the offspring population is produced. 

Step 3. Select from the parental population and offspring population to get the best combi- 
nation of a new population. Take the new population as parental population. 

Step 4- Run the Newton method (where the Hessian matrix is calculated explicitly) [Dennis et 
al., 1996] to relax all the individuals of parental population to local minimal energy positions. 

Step 5. Run discrete gradient method to refine the local minimum positions. The refined 
local minimal energy positions are set as the parental population. 

Step 6. Call the twinning mutations of [Wolf and Landman, 1998| . Then set a random 
number as the mutation rate for these mutation schemes to make mutations to the parental 
population. Then offspring population is produced. 

Step 7. Make a best combination of the parental population and the offspring population to 
get a new population. Take the new population as parental population. 

Step 8. Run the explicit Newton method to relax all the individuals of parental population 
to local minimal energy positions. 
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Step 9. Repeat Step 5. 
Step 10. Repeat Step 7. 

Step 11. If the algorithm reaches its convergence, then terminates, otherwise, goto Step 2. 

In Algorithm 2 the DG method and the exphcit Newton method are local search op- 
timization methods, and genetic algorithm is a global search method that brings the 
local solution to jump out of the trap. The cons of the Algorithm 2 lie in its initial 
solution specially chosen; whereas Algorithm 1 can start from any initial solution. The 
computational load of Algorithm 2 is very heavy, compared with that of Algorithm 1. 

SA and GA methods both are stochastic heuristic global search methods. But there 
are three basic differences between SA and GA: (1) SA simulates the annealing process 
of crystal materials with Monte Carlo property, GA simulates the process of natural 
competitive selection, crossover, and mutation of species; (2) GA is a parallel comput- 
ing (population) algorithm and SA algorithm is a sequential computing (individual) 
algorithm; (3) The choice of initial solution is different. In Step of Algorithm 2, 
we may choose clusters of 55, 75, 76, 85, 97, 98, or 147 atoms as the base of seed, 
because of their well-known structures of the best known solutions. For example, the 
98 atoms cluster has a tetrahedral structure; the 55 atoms cluster has an icosahedral 
structure; the cluster of 75 or 85 atoms has an octahedral structure; and the cluster of 
76 atoms has a decahedral structure. For 48 atoms, we use the structures of 13, 23, 19, 
10 atoms. In brief, all those 38 atoms face-centred-cubic (fee) truncated octahedron, 
75-77 and 102-104 atoms decahedron, and 98 atoms tetrahedron structures are based 
on polytetrahedral packing, which is structurally more similar to the icosahedral pack- 
ing of most Lennard-Jones clusters. Because of different background of SA and GA, 
the acceptance probability of solution is different very much. The computation load of 
GADG is clearly large than that of SADG. 



In Algorithm 1 the DG method Bagirov et al., 2008 is a derivative- free local search 



method for nonsmooth optimization with the continuous approximations to the Clarke 
subdifferential [Clarke, 1983 and the Demyanov-Rubinov quasidifferential [Demyanov 



and Rubinov, 2000], and the SA algorithm is using the neighborhood search procedure 



of Bagirov and Zhang, 2003 . The convergence of the proposed hybrid method directly 
follows from the convergence of SA and DG methods. The hybrid method starts from 
an initial point, first executes DG method to find local minimum, then carries on SA 
method in order to escape from this local minimum and to find a new starting point for 
the DG method. Then we again apply the DG method starting from the last point and 
so on until the sequence of the optimal objective function values gotten is convergent. 
Similarly for Algorithm 2, the DG and Newton methods are local search optimiza- 
tion methods, GA is global search optimization method, and the convergence of the 
hybrid GADG method simply follows from that of these three optimization methods. 
Algorithm 2 can successfully reproduce all the best L-J potential energy values known 
(|http://ph yschem.ox.ac.uk/~doye/jon/structures/LJ.html ) nearly up to 310 atoms and for- 
tunately some more precise energy values and better solution structures are got (Table 
1 and Figures 3-4), but it is not easy to be applied to our model construction work of 
this paper. 

In implementing Algorithm 1, we use T = 0.9*T as the temperature annealing 
schedule and the initial temperature is taken large enough according to the rule in 
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Table 1: Our precise L-J potential energy best values 



Number of atoms 


Our precise best value 


Best value known* 


39 


-180.033185202447 


-180.033185140508 


40 


-185.249838614238 


-185.249838598471 


42 


-196.277533506901 


-196.277533404920 


48 


-232.199531999140 


-232.199529316227 


55 


-279.248470461822 


-279.248470308143 


75 


-397.492330708363 


-397.492330681104 


76 


-402.894865906469 


-402.894865881675 


97 


-536.681382651509 


-536.681382483245 



" http://pliyscliem.ox.ac.uk/~doye/jon/structures/LJ.html, 





Figure 3: N=39, 40, 42, 48, 55, 75, 76, 97 (got by Algorithm 2) 
[Kirkpatrick et al., 1983] , where we take To=300 K. We restrict the number 



Figure 4: N=39, 40, 42, 48, 55, 75, 76, 97 (best structures known). 



for the outer procedure by 100 and number of iterations for the inner procedure by 1,000. 
The DG method part is terminated when the distance between the approximation to 
the subdifferential and origin is less than a given tolerance e > (e = 10"^). The 
proposed hybrid method fails to solve (4) when number of atoms > 20 (however, the 
AD Chains of Models 1-2 have 60/58 atoms). In order to reduce the number of local 
minima we suggest to approximate the function 

^(-) = ^(^^ - ^s) (5) 

(which is neither convex nor strictly the difference of two convex functions) by the 
following function [Zhang, 2004| : 

c/(r) = max(5i(T),min(52(r),5'3(r))) (6) 

where 5'i(t) is the piecewise linear approximation of the function yj(r) in segment (0, rg], 
52(7") is the piecewise linear approximation of this function over segment [ro,ri], and 



10 



finally 53 (r) is the piecewise linear approximation over [ri,b] and b is an enough large 
number. Here 

ro = v^, ri = 1/^277. (7) 

Such an approximation of the function f^r) allows us to remove many local minima of 
the Lennard-Jones potential function and to get a good approximation to the global 
minimum of the objective function / in problem (4). In numerical experiments we take 
6 = 16 and divide the segment [0.001, rp] into 100 segments, the segment [tq, ri] into 100 
segments and the [ri, 16] into 50 segments which allows one to get good approximations 
for the function (p{t). The replacement of the function (p{t) by the function g{T) makes 
the objective function nonsmooth. On the other side such a replacement significantly 
reduce the number of local minima. Since the discrete gradient method is a method of 
nonsmooth optimization the proposed hybrid method can be applied for solving this 
transformed problem. When solving the L-J problem (4), first we use the DG method 
with build-up technique to relax to an initial solution. Then we apply the hybrid 
method, with the above approximation for the objective function, to get another initial 
solution. Starting from this initial solution we again apply the derivative-free DG 
method and at last get the global solution. Results of numerical experiments (Table 
2) show that our techniques can effectively solve the L-J problem (4) when number of 
atoms is not greater than 310. For Model 1, seeing Figure 1 we may know that vdw 
interactions such as between 1D.CB-6A.0, 3D.CB-3A.CB, 6D.0-1A.CB, etc. should 
be maintained. Solving the optimization problem (4) can get A Chain and D Chain, 
where D Chain should have good vdw interactions with A Chain. Similarly for Model 
2, vdw interactions should be maintained between 3D.CB-3A.CB, etc. (Figure 2). AD 
Chains in all have 60/58 atoms. Thus, we may use the above improved hybrid SADG 
algorithm to easily get the optimal coordinates of AD Chains of Models 1-2, where D 
Chain has good vdw interactions with A Chain now (Figures 5-6). Other chains (i.e. 
/3-strands) of Models 1-2 are got from AD Chains by the parallelization of AD Chains. 
The initial structures of Models 1-2 are shown in Figures 7-8. 



3 Model Solving/ Optimization 

The L-J potential (2) energy of atoms vdw interactions is just a part of the total 
potential energy of a protein [Case et al., 2008 Locatelli and Schoen, 2008 : 



-E-total 



I^bonds ^r{r 



'eqj 



+ Edihedrals^[l+COs(n 
■vvdw 



<l>-i)] 



+ E 



R. 



electrostatic 



~^ ^-^H— bonds 



HQ] 



(8) 



The initial structures of Models 1-2 illuminated in Figures 7-8 are not the optimal 
structures with the lowest total potential energies. The initial structures also have no 
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Table 2: Our numerical results for the L-J Potential Problem 



Number of atoms 


Best value obtained 


Best value known* 


f9 


-72.659782 


-72.659782 


20 


-77.177043 


-77.177043 


21 


-81.684571 


-81.684571 


22 


-86.573675 


-86.809782 


23 


-92.844461 


-92.844472 


24 


-97.348815 


-97.348815 


25 


-102.372663 


-102.372663 


27 


-112.825517 


-112.873584 


30 


-128.096960 


-128.286571 


34 


-150.044528 


-150.044528 


44 


-207.631655 


-207.688728 


49 


-239.091863 


-239.091864 


56 


-283.324945 


-283.643105 


65 


-334.014007 


-334.971532 


67 


-347.053308 


-347.252007 


84 


-452.267210 


-452.6573 


93 


-510.653123 


-510.8779 


148 


-881.072948 


-881.072971 


170 


-1,024.791771 


-1,024.791797 


172 


-1,039.154878 


-1,039.154907 


268 


-1,706.182547 


-1,706.182605 


288 


-1,850.010789 


-1,850.010842 


293 


-1,888.427022 


-1,888.427400 


298 


-1,927.638727 


-1,927.638785 


300 


-1,942.106181 


-1,942.106775 


301 


-1,949.340973 


-1,949.341015 


304 


-1,971.044089 


-1,971.044144 


308 


-1,999.983235 


-1,999.983300 



pttp://physchem. ox.ac.uk/~doye/jon/structures/LJ.htmlf 



hydrogen atoms (so no hydrogen bonds existed) and water molecules added. For each 
Chain, the C-terminal and N-terminal atoms also have problems. Clearly there are a 
lot of close/bad contacts between /3-strand atoms as illuminated in Figures 7-8. Thus, 
we still use the hybrid techniques of SD, CG, SA optimization methods within AMBER 
Case et al., 2008} |Zhang, 2011a| to optimize the above Models 1-2 in order to get the 
most stable structures. Each of the most stable structures will have its lowest total 
potential energy, i.e. 

min^total- (9) 

We used the ff03 force field of AMBER 10, in a neutral pH environment. The 
amyloid fibrils were surrounded with a 8 angstrom layer of TIP3PB0X water molecules 
using the XLEaP module of AMBER 10. 1,360, 1,372 waters and 180, 168 hydrogen 
atoms were added separately for Models 1-2 by the XLEaP module. The solvated 
amyloid fibrils were minimized by the SD method and then the CG method were 
performed (OPTl). Model 1 were optimized by 95,016 steps of SD and 27,751 steps of 
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Figure 5: Model 1 - good vdw interactions of AD chains of the AAAAGA model. 



CG; Model 2 by 95,016 steps of SD and 24,418 steps of CO. Then the solvated amyloid 
fibrils were quickly heated from K to 300 K linearly during 20 ps. The systems were 
kept at 300 K for 80 ps. The systems then were slowly cooled from 300 K to 100 K 
linearly for 400 ps. At 100 K, the systems were kept for 100 ps, and then for 4,400 
ps until the systems reach sufficient equilibration at 100 K (the RMSD, PRESS, and 
VOLUME (DENSITY) were sufficiently stable though their variations are very large). 
The SA NDER (Simulated Annealing with NMR-Derived Energy Restraints) algorithm 
with nonbonded cutoffs of 12 angstroms were used during the heating, cooling and 
the 100 ps at 100 K. Step size is 2 fs for the whole SA runs. During the SA, the 
Metropolis criterion was used. After the SA, the models were refined by SD and CG 
methods again (0PT2), Model 1 was refined by 20,000 steps of SD and 597 steps of CG, 
and 20,000 steps of SD and 1,921 steps of CG for Model 2. All the above works were 
performed on the Tango facilities of the Victorian Partnership for Advanced Computing 
(http://www.vpac.org) of Australia. 



4 Results and Discussion 

Figures 9-10 show the potential energy development for the two Models (where the 
0PT1-SA-0PT2 of AMBER 10 were used to generate the potential energy and the 
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Figure 6: Model 2 - good vdw interactions of AD chains of the GAAAAG model. 



Figures were drawn with XMGRACE of Grace 5.1.21). We can see that the potential 
energy goes down during the SD and CG optimization phase OPTl, suddenly drops 
down and quickly goes up and then slowly goes down and levels off during the SA 
phase, and at last quickly goes down and then levels off during a short phase of 0PT2. 
At the beginning of SA, the energy quickly drops off is due to the temperatures of the 



systems being suddenly changed from 100 K Wiltzius et al., 2009 to K. This is a 



case of so called "quenching". Some energy values are listed in Table 3. In Table 3, 
the first column of energies (OPTl Ist-step) are the ones of the initial structures of 
Models 1-2. The distance between /S-strands is too short for the vdw contacts so that 
Amber 10 cannot show the large L-J potential values (in Table 3 Column 1). This 
also implies the initial structures (Figures 7-8) are far from their optimal structures. 
OPTl removes these bad vdw and hydrogen bond contacts and makes the structures 
become much better with lower potential energies. However, OPTl is a local search 
optimization method which cannot thoroughly optimize the models into their most 
stable structures. 

In Figures 9-10 we see that models are trapped into their local optimal structures. 
SA is a global search optimization method that can make OPTl jump out of the local 
trap, even accepting very bad cases with low probability according to the Metropolis 
criterion. Thus, in Table 3 we see that SA rapidly quenches the molecular struc- 
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Figure 7: Model 1 - initial structure of prion AAAAGA amyloid fibril. 



Table 3: Potential energy values 



Model 


OPTl Ist-stop 


OPTl 50th-stcp 


OPTl last-step 


SA Ist-stcp 


SA Last 10,000-steps 
{average value) 


OPT2 Ist-step 


OPT2 last-step 


1 

vdw 


5.0294E+12 


117,540 
36,804.4917 


-18,179 


-19,583.8427 


-17,074.54472 


-17,053 


-19,482 


2 

vdw 


1.8427E+16 


419,080 
232,373.1312 


-18,564 


-19,992.2174 


-17,461.46262 


-17,474 


-19,887 



tures, allowing escape from the local traps; SA finally results in the loss of 2,509.29798 
KCal/mol, 2,530.75478 KCal/mol for the two systems, respectively. 

After SA, 0PT2 can safely bring the molecular structures of the models to the 
most stable states. 0PT2 makes the molecules in Models 12 lose 2,429 KCal/mol, 
2,413 KCal/mol of potential energy, respectively. 0PT2 results in a loss of energy from 
Models 12 of nearly the same magnitude as that of SA (i.e. the decrease in energy in 
0PT2 is significant compared to the change of energy in SA between the 1st step and 
the average of the last 10,000 steps). OPTl could not make further optimization, but 
0PT2 could make further optimization after SA; this demonstrates the effectiveness of 
SA (shown in Table 3 by comparing the values of 0PT2 last-step with OPTl last-step). 
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Figure 8: Model 2 - initial structure of prion GAAAAG amyloid fibril. 



The final optimal molecular structures of Models 1-2 after 0PT2 are shown in Fig- 
ures 11-12, where the snapshots after 0PT2 were drawn by the free package Molecular 
Visuahsation & Modeling (MVM) ( [hFtp : / / www . zmmsoft . com / [ ) . The RMSDs (root 
mean square deviations) from the initial structures shown in Figures 7-8 (where the 
initial structures were drawn by MVM and were generated by the improved SADG Al- 
gorithm 1) are 2.71, 2.95 angstroms, respectively, for Models 1-2. The hydrogen bonds 
between the two closet adjacent /3-strands and the vdw contacts between the two in- 
ner closest adjacent alanines can be seen in Figures 11-12. In both models, there is 
about 5 angstroms between the two closet adjacent /3-sheets, maintained by hydropho- 
bic bonds, and about 4.5 angstroms between the two closet adjacent /3-strands, which 
are linked by hydrogen bonds such as Ala2.H-Ala7.0, Ala6.H-Glyll.O for Model 1 and 
Ala2.0-Ala8.H, Ala21.H-Ala32.0 for Model 2. There is a hydrophobic core in each of 
the models. These amyloid fibrils are rich in ^-sheet structure and contain a cross-/? 
core form of infectious prions, which causes prion diseases. 

Numerical results of this paper showed that a six chains AGAAAA model could 
not successfully pass SA. However, two prion AGAAAAGA palindrome amyloid fibril 
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Figure 9: Potential energy of Model 1. 
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Figure 10: Potential energy of Model 2. 



models - a six chains AAAAGA model (Model 1) and a six chains GAAAAG model 
(Model 2) - were successfully passing 0PT1-SA-0PT2 and got at last. 
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Figure 11: Model 1 - optimal structure of prion AAAAGA amyloid fibril. 

5 Conclusion 

In recent years large-scale global optimization (GO) problems have drawn considerable 
attention. These problems have many applications, in particular in biochemistry and 
data mining. Numerical methods for GO are often very time consuming and could not 
be applied for high-dimensional non-convex and/or non-smooth optimization problems. 
The study of new algorithms which allow one to solve large-scale GO problem is very 
important. One technique is to use hybrid of global and local/global search algorithms. 
This paper presents two hybrid methods for solving the large-scale L-J potential GO 
problem. The methods do not guarantee the calculation of a global solution; however 
results of numerical experiments show that they, as a rule, calculate a solution which is 
global one or close to it. The improved hybrid SADG method can be successfully applied 
to the construction work of optimal atomic-resolution structures of prion AGAAAAGA 
amyloid fibrils. As the three models constructed for amyloid fibrils in [Zhang, 201 la| , 
the two amyloid fibril models gained in this paper may be useful in furthering the goals 
of medicinal chemistry. 
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Figure 12: Model 2 - optimal structure of prion AAAAGA amyloid fibril. 
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